<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Getting Started with Languages | Elasticsearch: The Definitive Guide | Elastic</title>
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
</head>
<body>
<div class="main-container">
    <section id="content">
        
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">Source: <a href="https://www.elastic.co/guide/cn/elasticsearch/guide/current/language-intro.html" rel="nofollow">https://www.elastic.co/guide/cn/elasticsearch/guide/current/language-intro.html</a>, copyright www.elastic.co<br/>
                            English version: <a href="https://www.elastic.co/guide/en/elasticsearch/guide/current/language-intro.html" rel="nofollow">https://www.elastic.co/guide/en/elasticsearch/guide/current/language-intro.html</a>
                            </div>
                        <!-- start body -->
                  <div class="page_header">
<b>Please note:</b><br>This book is based on Elasticsearch 2.x, so some of its content may be out of date.
</div>
<div id="content">
<div class="navheader">
<span class="prev">
<a href="languages.html">« Dealing with Human Language</a>
</span>
<span class="next">
<a href="using-language-analyzers.html">Using Language Analyzers »</a>
</span>
</div>
<div class="chapter">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="language-intro"></a>Getting Started with Languages
</h2>
</div></div></div>
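<p>Elasticsearch ships with a collection of language analyzers that provide good, basic, out-of-the-box support for many of the world's most common languages:</p>
<p>Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Korean, Kurdish, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Thai.</p>
<p>These analyzers typically perform four roles:</p>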
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<p>Tokenize text into individual words:</p>
<p><code class="literal">The quick brown foxes</code> → [ <code class="literal">The</code>, <code class="literal">quick</code>, <code class="literal">brown</code>, <code class="literal">foxes</code>]</p>
</li>
<li class="listitem">
<p>Lowercase tokens:</p>
<p><code class="literal">The</code> → <code class="literal">the</code></p>
</li>
<li class="listitem">
<p>Remove common <em>stopwords</em>:</p>
<p>[ <code class="literal">The</code>, <code class="literal">quick</code>, <code class="literal">brown</code>, <code class="literal">foxes</code>] → [ <code class="literal">quick</code>, <code class="literal">brown</code>, <code class="literal">foxes</code>]</p>
</li>
<li class="listitem">
<p>Stem inflected words (such as plurals and past tenses) to their root form:</p>
<p><code class="literal">foxes</code> → <code class="literal">fox</code></p>
</li>
</ul>
</div>
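<p>The four roles above can be pictured as a chain of small steps applied in order. The Python sketch below is a toy model of that chain, not how Elasticsearch implements it: the regex tokenizer, the small <code class="literal">STOPWORDS</code> set and the suffix-stripping <code class="literal">stem</code> function are hypothetical simplifications, whereas real language analyzers rely on Lucene tokenizers, full stopword lists and algorithmic stemmers.</p>

```python
import re

# Hypothetical, tiny stopword set for illustration only; real language
# analyzers ship much longer, language-specific lists.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenize(text):
    """Role 1: split text into words (a crude regex stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text)


def lowercase(tokens):
    """Role 2: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens):
    """Role 3: drop common words that carry little meaning for search."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Role 4: naive suffix stripping for plurals and past tenses (not a real stemmer)."""
    if token.endswith(("xes", "ches", "shes", "sses")):
        return token[:-2]          # foxes -> fox, classes -> class
    if token.endswith("ed") and len(token) > 4:
        return token[:-2]          # jumped -> jump
    if token.endswith("s") and not token.endswith("ss"):
        return token[:-1]          # cats -> cat
    return token


def analyze(text):
    """Apply the four roles in order: tokenize, lowercase, remove stopwords, stem."""
    tokens = remove_stopwords(lowercase(tokenize(text)))
    return [stem(t) for t in tokens]


print(analyze("The quick brown foxes"))  # ['quick', 'brown', 'fox']
```

<p>Order matters in this sketch: lowercasing runs before stopword removal so that <code class="literal">The</code> matches the lowercase entry <code class="literal">the</code> in the stopword set.</p>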
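<p>Each analyzer may also apply other transformations specific to its language, in order to make words from that language more searchable:</p>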
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<p>The <code class="literal">english</code> analyzer removes the possessive <code class="literal">'s</code>:</p>
<p><code class="literal">John's</code> → <code class="literal">john</code></p>
</li>
<li class="listitem">
<p>The <code class="literal">french</code> analyzer removes <em>elisions</em> like <code class="literal">l'</code> and <code class="literal">qu'</code> and <em>diacritics</em> like <code class="literal">¨</code> or <code class="literal">^</code>:</p>
<p><code class="literal">l'église</code> → <code class="literal">eglis</code></p>
</li>
<li class="listitem">
<p>The <code class="literal">german</code> analyzer normalizes terms, replacing <code class="literal">ä</code> and <code class="literal">ae</code> with <code class="literal">a</code>, or <code class="literal">ß</code> with <code class="literal">ss</code>, among others:</p>
<p><code class="literal">äußerst</code> → <code class="literal">ausserst</code></p>
</li>
</ul>
</div>
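<p>The three language-specific rules above can be mimicked with a few string operations. The Python sketch below is a hypothetical toy, not the code Elasticsearch runs: each function covers only the examples listed, the <code class="literal">ELISIONS</code> tuple is a small assumed subset, the final-<code class="literal">e</code> strip in <code class="literal">french_normalize</code> is an assumed stand-in for French stemming, and the <code class="literal">RULES</code> table is an illustrative name. It uses only the standard library (<code class="literal">unicodedata</code> to decompose accented letters and drop the combining marks).</p>

```python
import unicodedata

# Hypothetical subset of French elided forms; the real list is longer.
ELISIONS = ("l'", "qu'")


def strip_diacritics(text):
    """Decompose accented letters (é -> e + combining accent) and drop the accents."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def english_normalize(token):
    """Remove a trailing possessive 's, then lowercase: John's -> john."""
    if token.endswith("'s"):
        token = token[:-2]
    return token.lower()


def french_normalize(token):
    """Remove a leading elision and diacritics, then apply a toy stem."""
    token = token.lower()
    for prefix in ELISIONS:
        if token.startswith(prefix):
            token = token[len(prefix):]
            break
    token = strip_diacritics(token)
    # Assumption: dropping a final -e stands in for real French stemming.
    if token.endswith("e"):
        token = token[:-1]
    return token


def german_normalize(token):
    """Apply only the replacements named in the text: ä -> a, ae -> a, ß -> ss."""
    token = token.lower()
    for src, dst in (("ä", "a"), ("ae", "a"), ("ß", "ss")):
        token = token.replace(src, dst)
    return token


# Hypothetical per-language dispatch table.
RULES = {
    "english": english_normalize,
    "french": french_normalize,
    "german": german_normalize,
}

for lang, word in (("english", "John's"), ("french", "l'église"), ("german", "äußerst")):
    print(f"{lang}: {word} -> {RULES[lang](word)}")
# english: John's -> john
# french: l'église -> eglis
# german: äußerst -> ausserst
```

<p>Keeping one normalization function per language, as the <code class="literal">RULES</code> table does, reflects why each language gets its own analyzer: a rule such as replacing <code class="literal">ae</code> with <code class="literal">a</code> helps German search but would mangle many English words.</p>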






</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>